AITopics | bev semantic segmentation

NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

Li, Siyu, Teng, Fei, Cao, Yihong, Yang, Kailun, Li, Zhiyong, Wang, Yaonan

arXiv.org Artificial Intelligence, Jul-8-2025

Our approach is motivated by the potential of leveraging noisy synthetic data from driving world models to enhance BEV semantic segmentation. The proposed method investigates a noise-resilient learning framework designed for handling synthetic data with inherent noise. The generated data from different world models exhibits inconsistent road structures at identical viewpoints.

Abstract: Bird's-Eye-View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, although pivotal for real-world applications, underperforms because of the homogeneous distribution of the labeled data. In this work, we explore the potential of synthetic data from driving world models to enhance the diversity of labeled data and make BEV segmentation more robust. Yet our preliminary findings reveal that generation noise in synthetic data compromises efficient BEV model learning. To fully harness the potential of synthetic data from world models, this paper proposes NRSeg, a noise-resilient learning framework for BEV semantic segmentation. Specifically, a Perspective-Geometry Consistency Metric (PGCM) is proposed to quantitatively evaluate the guidance capability of generated data for model learning. This metric originates from the alignment measure between the perspective road mask of generated data and the mask projected from the BEV labels.

This work was supported in part by the National Natural Science Foundation of China (No. U21A20518, No. 61976086, and No. 62473139) and in part by the Open Research Project of the State Key Laboratory of Industrial Control Technology, China (Grant No. ICT2025B20).

Wang are with the School of Robotics and the National Engineering Research Center of Robot Visual Perception and Control Technology, Hunan University, Changsha 410082, China (email: kailun.yang@hnu.edu.cn). Cao is with the Key Laboratory of Big Data Research and Application for Basic Education, Hunan Normal University, Changsha 410006, China.
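
The abstract above describes PGCM only as an alignment measure between two road masks: one taken from the generated perspective image and one projected from the BEV labels. The exact formula is not given in this listing, so the sketch below substitutes plain intersection-over-union (IoU) as a stand-in alignment score. The function name `mask_iou`, the list-of-rows mask format, and the choice of IoU itself are assumptions made for illustration, not the authors' definition.

```python
from typing import Sequence

# Assumed mask format: rows of 0/1 values, where 1 marks a road pixel.
Mask = Sequence[Sequence[int]]


def mask_iou(generated_mask: Mask, projected_mask: Mask) -> float:
    """Intersection-over-union of two same-sized binary road masks.

    Hypothetical stand-in for an alignment measure; not the paper's PGCM.
    """
    if len(generated_mask) != len(projected_mask):
        raise ValueError("masks must have the same number of rows")
    intersection = 0
    union = 0
    for gen_row, proj_row in zip(generated_mask, projected_mask):
        if len(gen_row) != len(proj_row):
            raise ValueError("mask rows must have the same length")
        for g, p in zip(gen_row, proj_row):
            g_on, p_on = bool(g), bool(p)
            intersection += g_on and p_on  # pixel is road in both masks
            union += g_on or p_on          # pixel is road in at least one mask
    # Two empty masks agree trivially; treat that case as perfect alignment.
    return intersection / union if union else 1.0


# Toy example: the generated image adds road pixels in the bottom row
# that the mask projected from the BEV labels does not contain.
generated = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [1, 1, 1, 1]]
projected = [[0, 1, 1, 0],
             [0, 1, 1, 0],
             [0, 1, 1, 0]]
print(mask_iou(generated, projected))  # 0.75
```

Under this assumption, a low score would flag a generated sample whose road layout disagrees with its BEV label, which is the kind of generation noise the abstract says the framework is meant to tolerate.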

artificial intelligence, deep learning, machine learning, (15 more...)

2507.04002

Country:

Asia > China (0.84)
Asia > Singapore > Central Region > Singapore (0.04)

Genre:

Research Report > New Finding (0.68)

Industry:

Information Technology (0.48)
Transportation > Ground > Road (0.35)
Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Neural World Models for Computer Vision

Hu, Anthony

arXiv.org Artificial Intelligence, Jun-15-2023

Humans navigate their environment by learning a mental model of the world through passive observation and active interaction. Their world model allows them to anticipate what might happen next and act accordingly with respect to an underlying objective. Such world models hold strong promise for planning in complex environments such as autonomous driving. A human driver, or a self-driving system, perceives their surroundings with their eyes or their cameras. They infer an internal representation of the world which should: (i) have spatial memory (e.g. occlusions), (ii) fill in partially observable or noisy inputs (e.g. when blinded by sunlight), and (iii) be able to reason about unobservable events probabilistically (e.g. predict different possible futures). They are embodied intelligent agents that can predict, plan, and act in the physical world through their world model. In this thesis we present a general framework to train a world model and a policy, parameterised by deep neural networks, from camera observations and expert demonstrations. We leverage important computer vision concepts such as geometry, semantics, and motion to scale world models to complex urban driving scenes. First, we propose a model that predicts important quantities in computer vision: depth, semantic segmentation, and optical flow. We then use 3D geometry as an inductive bias to operate in the bird's-eye view space. We present for the first time a model that can predict probabilistic future trajectories of dynamic agents in bird's-eye view from 360° surround monocular cameras only. Finally, we demonstrate the benefits of learning a world model in closed-loop driving. Our model can jointly predict static scene, dynamic scene, and ego-behaviour in an urban driving environment.

bev semantic segmentation, kullback-leibler divergence, pattern analysis and machine intelligence, (14 more...)

2306.09179

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.13)
(6 more...)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Transportation > Ground > Road (1.00)
Leisure & Entertainment > Games (1.00)
Health & Medicine > Therapeutic Area (1.00)
(7 more...)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(2 more...)
